ABSTRACT

Diagnostic testing can provide specific information 
about student skills as a decision-making aid to teachers in 
prescribing instruction, identifying needs for remediation, 
determining effective instructional materials and methods, and 
ultimately, improving student learning. Diagnostic testing, as viewed 
here, includes individual and group assessment of students' skills in 
specified cognitive domains. A methodology is presented for designing 
diagnostic tests which assess the extent of student learning and are 
sensitive to sources of difficulty within a skill or context area.
This methodology for diagnostic test development includes: (1)
Developing a skill blueprint including a general description of the
objective or skill, a sample item, content limits, and response
limits; (2) Specifying the skill map including sub-skills or simpler
contexts which students should master en route to the desired skill
under assessment; (3) Formulating test items that match
specifications and follow conventions for sound item-writing; (4)
Reviewing test items to ensure match to specifications and technical
quality; and (5) Field testing the items and revising to ensure that
the test is appropriate for the intended student population and
structured to provide meaningful and reliable diagnostic information.
(Author). 
GUIDELINES FOR DEVELOPING DIAGNOSTIC TESTS

Diagnostic testing can serve a variety of important roles in routine
classroom practice, helping teachers to enhance their instructional
effectiveness and improve student learning. By providing specific
information about the entrance skills that students have and have not
acquired, the types of tasks and subtasks they have mastered, and the
nature of their errors and misconceptions, diagnostic tests are a useful
tool in ongoing instructional planning. They can be used:

o at the beginning of the school year, to identify individual, group
  and/or class needs in order to prescribe appropriate instruction;

o during the school year, to assess areas of instruction where
  individuals or groups of students are having difficulty and to
  identify specific needs for remediation;

o throughout the school year, to identify areas where instructional
  materials and methods were effective and those which are in need of
  modification.

This view of diagnostic testing is both broader and narrower than
common definitions. First, it broadens the definition of diagnostic
testing to include all tests which provide systematic information about
what skills students have and have not acquired. Second, it moves beyond
individual assessment to encompass both tests which can be used to make
instructional decisions about individual students and tests which can be
used to guide instruction for groups of students. It is narrow, however,
in that it focuses on assessment of academic achievement to identify
student strengths and weaknesses and does not consider the range of other
relevant, non-cognitive factors which may elucidate the reasons for their
performance. This latter focus is not meant to underestimate the
significance of other factors in students' learning and instruction nor
their importance in designing effective educational treatments. Diagnosis
and prescription for individual students which ignores student affect,
motivation, nutrition, and vision, to name a few potentially relevant
factors, is likely to be found wanting, and these factors must, of course,
be included in a comprehensive diagnostic system. Nonetheless, a thorough
understanding of what students can and cannot do, what skills they have and
have not acquired, and of where there may be gaps in their learning, is at
the heart of a sound diagnostic-prescriptive approach.

This paper provides a methodology for designing diagnostic tests which
systematically assess the extent of student learning and seek to locate,
where appropriate, sources of difficulty within particular skill or content
areas. The approach is keyed to a teacher's or a curriculum's
instructional intentions and considers students' status with respect to
those intentions. That is, it starts with specific curriculum goals and
objectives, creates a potential learning map by analyzing the subtasks,
competencies and/or component skills that are necessary to the achievement
of the desired objectives, and builds a test to chart students' progress
with regard to the map. The map not only provides the means for diagnosing
student difficulties, but also helps to clarify instructional intentions
and to target instructional activities. The result is instruction which
systematically teaches students necessary pre-requisites and builds their
skills to desired levels. Under these conditions, the assessment is tied
directly to the instructional context and its instructional implications
are clear.

How does one build such a test? The following five steps can guide
the test development process:

1. Develop a blueprint of the skill or content area you want to
   diagnose, i.e., clarify the nature of the skill(s) you intend to
   assess and the technique you will use to measure students'
   learning;

2. Develop a map which specifies the tasks and subtasks that are
   prerequisite to the assessed skill(s);

3. Write test items based on the identified blueprint and map,
   utilizing common conventions of item-writing;

4. Review the test items to confirm their match to the specifications
   and to assure that items do not contain extraneous complexities,
   unintended cues, or other technical flaws;

5. Field test the items to determine where item revisions are
   necessary, and/or where the blueprints and maps need to be
   adjusted; to determine whether there is a relationship between the
   hypothesized pre-requisites and the desired objectives; and to
   determine the number of items required for testing.

Each of these steps is described in the following sections.
Step One: Develop The Skill Blueprint 

This first step of the test development process often is the most
arduous. Developing a skill blueprint requires hard thought about the
nature of the skill that is to be assessed and the nature of item content
and format which can most appropriately assess its attainment. Because of
the effort involved in test development and administration, these skills
ought to reflect those which require large chunks of instructional time and
which represent major goals for students for a unit, semester, or year.

Identify objectives worth testing. The first step within the
specification process, then, is to identify objectives worth testing. A
number of screens may be considered in determining the most suitable
targets of assessment:

How much instructional time does it take to teach the objective? As
mentioned above, you'll want to select objectives that cover a
reasonable amount of instructional time.

How does the objective relate to other higher-order skills? Recent
reports on the status of American education have been critical of
the level at which some instruction occurs. Be sure that the
knowledge and skills you are testing and teaching reflect or are
pre-requisites to higher-level thinking, problem-solving skills and
important educational goals.

How does the objective relate to long-term curricular goals? As with
the concerns raised above, be sure that the objectives identified
for assessment are relevant to important curricular objectives and
are part of a coherent strand of learning.

What is the intrinsic importance of the objective? Related to all
of the above, be sure that the objects of assessment reflect
important and not trivial learning tasks.

Specify the skill required to meet the objective. After the
objective(s) has been identified, try to clarify the nature of the skill(s)
that students are expected to acquire. What are they supposed to be able
to do? For example: comprehend the main idea of particular types of texts;
solve particular types of physics problems; analyze the causes of particular
types of world events; analyze particular literary works with regard to
their plot, characterization, and setting; predict the short and long term
consequences of particular environmental intrusions; recall major events of
the Civil War; write an expository essay with certain characteristics; etc.

Consider the level of cognitive complexity at which students are
expected to function. For example, following Bloom (1956), does the skill
of interest involve recall, application, analysis, or synthesis? Or,
following Gagne (1970), does the skill represent concept learning (concrete
or abstract), principles, procedures, or problem-solving?

Clarify the content which will be covered. Consider also the nature
of the content that needs to be included on the test. You may want to
examine available curriculum materials as well as your own judgment and
experience as you consider some of the following questions:

In how many different contexts will students need to apply the
skill? For example, in the reading example above, will students
need to use their comprehension skill with expository and narrative
texts, in texts where the main idea is implicit or explicit? In the
physics example above, will students need to apply specific physics
principles in laboratory settings, in real life-like situations in
space, in aircraft, or in home situations? In the history example,
how many and what types of historical events will students be
required to analyze?

What information will students need to know? Is there a list of
concepts, vocabulary, and facts that students will be expected to
acquire? For example, in the science example above, what bones will
be included in the instructional and test content and what is their
function? Or, in the Civil War example, what types of events are to
be considered major?

How many different topics will be used to test students' skill
acquisition? For example, in the expository essay example above,
what kinds of prompts will be provided to students? Will they
include topics which students have directly experienced, topics
which are related to those learned in other parts of the curriculum,
and will they require persuasion and/or description?

Are there pre-requisite skills that students will need to acquire?

The idea, again, is to clarify the nature of the skill or content that is
to be assessed and diagnosed.

Select appropriate item type. Once you feel satisfied that you
thoroughly understand the skill, consider what item format might be most
suitable for the assessment and, within the selected format, what types of
items are most appropriate.

Consider the range of item formats: selected response items,
including true-false, matching, and multiple choice; constructed response,
including short answer and essay; and performance measures, including
observation and rating scales. There are no hard and fast rules for
choosing particular item formats, although there is sometimes an inverse
relationship between the ease with which an item is constructed and/or
scored and its measurement validity. For example, although true-false
items are easy to construct, students have a 50% chance of guessing the
correct answer to any one of them. On the other end of the spectrum,
although they are quite time consuming to score, essay tests provide the
best measure of students' writing skill and are the only valid alternative
where divergent responses are desired.
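The guessing problem noted above can be made concrete with a little
arithmetic. The following is a minimal sketch, not part of the original
guidelines: it assumes a student answers every item by blind guessing and
computes the chance of reaching a mastery criterion anyway. The function
name `prob_at_least`, the five-item subtest, and the four-of-five
criterion are illustrative assumptions; the five-option case corresponds
to a multiple choice item with five alternatives.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k correct out of n items by blind guessing.

    p is the chance of guessing a single item correctly
    (1/2 for true-false, 1/5 for a five-option multiple choice item).
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical criterion: at least 4 of 5 items correct.
TRUE_FALSE = 1 / 2
FIVE_OPTION = 1 / 5

if __name__ == "__main__":
    for label, p in [("true-false", TRUE_FALSE), ("five-option", FIVE_OPTION)]:
        print(label, round(prob_at_least(4, 5, p), 4))
```

Under these assumptions, a blind guesser reaches four of five on
true-false items about 19% of the time, but well under 1% of the time on
five-option items.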

With regard to particular types of items within an item format,
brainstorm some alternatives and choose the one which can best elicit the
skills in which you're interested. Write a couple of items which
illustrate the kind of item you have in mind. If the items are to be
administered via computer, be sure each item is structured to fit within the
existing constraints, e.g., the number of lines that will fit in a single
screen.

Write the skill blueprint. Once the sample items have been
formulated, the skill blueprint can be written. Different researchers have
suggested slightly different formats. The one described below combines
models suggested by Popham (1980) and Baker (1974) and includes the
following components:

o General description - a brief description of the objective, skill,
  or knowledge to be measured.

o Sample item - a model of what test items are to look like, including
  directions to be given to students.

o Content limits - a description of the nature of the question that is
  to be presented to students.

o Response limits - for selected response items, a description of the
  response options provided to students or, for constructed response
  items, the set of rules or criteria that are to be used to judge the
  quality of a student's response.

The first two components are relatively straightforward: they
include a statement of the objective selected for testing and instruction
and the sample item that has been devised for assessing it. Include here
also the directions that will be given to students. Explicit attention to
the directions early on helps to assure that they will be clear and that
students will understand how to complete each item.
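One way to keep the four blueprint components together while drafting is
a small record type. The sketch below is only an illustration, not a
format prescribed by Popham or Baker: the class name `SkillBlueprint`,
its field names, and the `missing_components` helper are hypothetical,
and the example strings paraphrase the pronoun example used in this
paper.

```python
from dataclasses import dataclass, field


@dataclass
class SkillBlueprint:
    """The blueprint components; directions are kept beside the sample item."""

    general_description: str = ""
    sample_item: str = ""
    directions: str = ""
    content_limits: list[str] = field(default_factory=list)
    response_limits: list[str] = field(default_factory=list)

    def missing_components(self) -> list[str]:
        """Return the names of components that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# A partially drafted blueprint (wording paraphrased for illustration).
pronoun_blueprint = SkillBlueprint(
    general_description="Students choose the pronoun that correctly completes a sentence.",
    content_limits=[
        "Stem is a short (3-5 sentence) paragraph about two or more named individuals.",
        "A blank replaces the named individual(s) in one sentence.",
    ],
    response_limits=["Five alternatives are provided for each item."],
)

print(pronoun_blueprint.missing_components())  # ['sample_item', 'directions']
```

A check of this kind simply flags that the sample item and directions
have not yet been written; it says nothing about their quality, which
still requires the review described in Step Four.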

Content limits describe the range of eligible content from which test
items may be written. They may include rules for creating questions, and
rules for the inclusion of prompts, cues, or additional materials such as
pictures, graphs, and reading selections.

Content limits for selected response items define and restrict the
characteristics, format, and eligible content to be included in the item
stem. By systematically including the different situations and contexts in
which the skills are to be applied and/or the rules which define the
assessed skill, test items can provide valuable diagnostic information,
such as: In what situations are students able to demonstrate a particular
skill? What rules have students mastered? For instance, for a multiple
choice item assessing students' skill in using appropriate pronouns, the
content limits might be formulated as follows:

o The item stem will present the student with a short (3-5 sentence)
  paragraph which describes an action or event involving two or more
  named individuals.

o A blank will replace the named individual(s) in one sentence.

o Students will be asked to identify the pronoun which correctly
  completes the sentence.

o Items will be written to exemplify the following rules:

  - When the pronoun is the subject of a sentence or clause, it
    should be in the nominative case.

  - When the pronoun is the direct object, it should be in the
    objective case.

  - When the pronoun is the indirect object, it should be in the
    objective case.

(Note that systematically including items reflecting each rule enables
a test to diagnose which rule(s) is causing students' difficulty; the
problem of ascertaining the number of items to be written to reflect each
rule is addressed in a later section.)

Content limits for constructed responses define and restrict the
prompt, the mode of response, and where appropriate, the conditions,
setting or context surrounding the testing. The content limits for an
expository essay task, for example, would specify rules for generating
essay prompts and the directions and any special cues to be given to
students. For instance:

o The prompt will present students with a proposition and ask the
  student to take a position.

o The topic presented in the proposition to students must be one with
  which almost all high school students would be familiar, e.g., a
  topic dealing with a situation commonly encountered in daily living
  at home or at school.

o The topic must embody an issue on which students would be likely to
  have differing opinions, i.e., in favor of or opposed to the
  proposition.

o One sentence will provide brief background to the proposition and
  will include common reasons supporting the proposition. A second
  sentence will include common support for the opposing position.

o These sentences will be labeled: "Background."

o The background sentences will be followed by the assignment, which
  consists of the following sentence: "Write a paragraph in which
  you are in favor of, or opposed to, ________." Be sure to
  support the position you have taken.

Response limits provide rules for generating the correct response and
incorrect alternatives for selected response questions, and rules and
criteria for judging the quality or correctness of a student's constructed
response. Like the content limits, response limits help define the range
of eligible content, but here the focus is on student responses: What
discriminations are expected and reasonable? What are the characteristics
of an acceptable response? What are common misconceptions? For selected
response items, response limits provide rules for constructing the correct
answer and the distractors, or wrong answer alternatives, for each item.
These rules should assure distractors that represent common student errors
and which thus may provide important diagnostic information. For example,
response limits for the pronoun example described above might be as
follows:



o Five alternatives will be provided for each item, the correct
  answer and four alternatives.

o The correct response will exemplify the proper application of the
  given rules and will reflect the appropriate gender and number.

o Distractors will consist of the following:

  - a pronoun in the correct case, but incorrect in number or gender;

  - a pronoun in the incorrect case, but correct in number and
    gender;

  - a pronoun representing an incorrect referent, but correct in
    case, number, and gender;

  - a pronoun in the incorrect case, incorrect in number and/or
    gender.

With such a set of alternatives, a student's wrong answer choice might
provide information on whether he/she was having difficulty in identifying
referents, was confused about case rules, and/or was having difficulties
associated with number and gender.
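The diagnostic use of distractors described above can be sketched as a
simple tally. This is an illustrative sketch only: the category labels,
the `DISTRACTOR_ERRORS` table, and the `diagnose` function are
hypothetical names, and the mapping assumes each alternative on an item
has been tagged with the distractor rule it was written to satisfy.

```python
from collections import Counter

# Which source(s) of difficulty each kind of alternative points to,
# following the four distractor rules listed above.
DISTRACTOR_ERRORS = {
    "correct": (),
    "right_case_wrong_number_or_gender": ("number/gender",),
    "wrong_case_right_number_and_gender": ("case",),
    "wrong_referent": ("referent",),
    "wrong_case_wrong_number_or_gender": ("case", "number/gender"),
}


def diagnose(choices):
    """Count how often each source of difficulty appears in a student's choices."""
    tally = Counter()
    for choice in choices:
        tally.update(DISTRACTOR_ERRORS[choice])
    return tally


# One student's tagged answers across five items (made-up data).
student = [
    "correct",
    "wrong_case_right_number_and_gender",
    "wrong_case_wrong_number_or_gender",
    "wrong_referent",
    "correct",
]
print(diagnose(student).most_common())
```

For this made-up student the tally points most often to case rules,
which is the kind of pattern a teacher might follow up on; with only a
few items per rule, such counts are suggestive rather than conclusive.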

For a constructed response item, response limits provide rules for
judging or rating the adequacy of a student's response. Defining response
limits using a set of concrete criteria maximizes both the diagnostic value
of the assessment and its implications for instruction. For example,
response limits for the writing example described above might be as
follows:

Student essays will be rated based on their organization, support,
and mechanics. A five point scale will be used for rating each
area, with a five designating the high end.

Organization will be rated as follows:

5 = essay is on topic; the paragraph includes a topic sentence which
states a position regarding the assigned topic; the essay includes
at least three reasons supporting the position; all sentences in
the essay support the topic sentence.

4 =

Support will be rated as follows:

5 =
The result of this specification process is a map for developing test
items and likewise a map guiding instruction. Not only does the
specification provide rules for developing multiple parallel test items, it
likewise can be used to plan instruction and to generate relevant exercises
for classroom practice, practice which will help students to acquire the
specific skills they are intended to learn. So, although the process takes
some time and effort, there are potential pay-offs.
Step Two: Specify The Skill Map

During the first step, the skill which is the target of assessment has
been identified and well specified and a blueprint has been created for
developing test items to assess that skill. The specification and the test
items it implies, where possible, have been designed to provide diagnostic
information about students' performance. For example, in the pronoun
example cited above, the test items are to be created to assess students'
attainment of particular rules of pronoun usage and the alternatives have
been developed to provide information about whether students are
experiencing difficulty with referents, case, and number. Likewise, in the
essay example, scoring rules were created to rate students' writing in
terms of concrete skills of organization, support, and grammar.

A finer grained diagnosis can be achieved by analyzing the level of
difficulty at which students are able to operate, and/or the subtasks and
subskills which they have mastered en route to the desired assessed skill.
In other words, suppose students are not able to perform the assessed
skill correctly: is it possible to place them on a continuum from no skill
through some skill to fully skilled, and how might one define the points on
the continuum? If one can define the points on the continuum in terms of
specific competencies and/or identify the relevant skill hierarchy, then it
is possible to devise test items which can appropriately diagnose students'
skill level.

How to define the skill continuum or hierarchy is a problem that has
been addressed by a number of researchers, but there are no hard and fast
rules or breakdowns that are applicable across a range of subject areas.
Combining logic, theory, and research in learning and instruction, as well
as practical experience in teaching students the target skill, two
inter-related strategies may be useful in diagnosing skill level:

o identify simpler contexts in which students may be able to
  demonstrate the skill and/or simpler tasks which require skills
  similar to the target skill;

o identify pre-requisite skills and knowledge which students would
  need to master in order to attain the target skill.

Identifying simpler contexts/tasks. Several research-based principles
can aid in the identification of simpler contexts or tasks which can help
to define interim points on a skill continuum. These principles are
inter-related rather than exclusive and include linguistic complexity,
cognitive complexity, and level of discrimination.

The logic of using linguistic complexity to help diagnose students'
skills in reading is obvious. The question is, for example, if a student
cannot comprehend the main idea of a particular passage, can he/she
comprehend the same or a different passage written at a lower level of
linguistic complexity? Similarly, in an English example, if a student has
difficulty analyzing the protagonist's character in a given story, can
he/she perform the analysis with a simpler text? (It should be noted that
when reading skill is not the object of assessment, linguistic complexity
should be controlled, to the extent possible, so that it does not influence
a student's performance; e.g., if a student's math skills were the subject
of assessment, the test developer would want to keep the language to a
simple level so that reading ability did not influence performance.)

Cognitive complexity is a second factor which can help to define a
continuum of task difficulty. It has to do with the level of processing
required by a problem and the number of cognitive steps that a student
would need to complete. The question is, if a student cannot handle one
level of cognitive complexity, can he/she handle the problem at a simpler
level of complexity? Consider, for example, the pronoun usage items
described above. Students were to be given a short passage and were to be
asked to identify the pronoun which would correctly complete a blank within
the passage. The task requires students to use the context of the passage
to identify the correct referent and then to match the referent with the
appropriate pronoun. In a simpler task, the student might be given a
sentence in which a subject or object were underlined and asked to identify
the pronoun which could be substituted for the underlined word. Thus, the
task would not require students to process the passage to identify the
referent. Based on students' responses to these tasks, a teacher might be
able to pinpoint a student's problem as related to identifying referents in
context.

Required level of discrimination is a third factor which may be
helpful in thinking about difficulty. Some tasks require fine levels of
discrimination among concepts and topics while in other tasks only gross
discriminations are necessary. For example, consider the following two
items which ask students to identify a triangle from a set of alternatives
(from Baker and Herman, 1983):
Both examples assess students' understanding of the concept "triangle," but
the second clearly requires finer discriminations. To mark the correct
answer in the second example, students must be able to apply features of
three-sidedness, closed figure, linear figure, while in the first example
knowing that a triangle is a geometric figure is sufficient information to
arrive at a correct answer and any one of the three defining features of a
triangle may be used to respond correctly.

Two other examples illustrate the notion of discrimination. Consider
the question, "Which country is more democratic, Italy or France?" vs.
"Which country is more democratic, the United States or the USSR?" Imagine
also two literary analysis problems which ask students to describe the
theme of given works. In one work, there is a unitary theme which is
obvious; in the second, there are several sub-themes and the central one is
less salient.

Closely related to the level of required discrimination is the level
of prompting and/or salient cues given to the student about what he/she
is supposed to do. Suppose, for example, that a student is given life-like
problems which he/she is supposed to solve using principles of physics, but
the problems are silent on which principle(s) apply. A simpler version
might prompt the student on what principle to use for each problem.

Identifying Prerequisite Skills and Knowledge. The previous section
has discussed potential strategies for simplifying the context in which
students apply their skill in order to diagnose the level at which they are
able to operate. A complementary (and inter-related) approach to the
diagnostic problem is to consider simplifying the skills which students are
asked to demonstrate and to try to locate their performance within a
hierarchy of prerequisites, identifying gaps where instruction is needed.
For example, in the reading comprehension example cited above, students
were asked to identify the main idea of given passages. Skills
prerequisite to comprehending the main idea and which might be assessed are
comprehending details of the passage, understanding the specific vocabulary
used in the passage, and so on.

The iterative question which needs to be addressed in identifying
prerequisite skills is "What does a student need to be able to do in order
to attain a given skill? What subskills does he/she most likely need to
learn en route to the desired skill?" Examples easily come to mind in the
area of mathematics (cf. Gagne, 1977). In order to subtract whole numbers
of any size, a student would need to be able, in ascending order of
difficulty, to subtract without borrowing, to subtract when several
borrowings are required in non-adjacent columns, and to subtract when
successive borrowings are required from adjacent columns.
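The subtraction hierarchy above lends itself to a mechanical check when
sorting or generating items for each diagnostic point. The following is a
rough sketch under stated assumptions, not a procedure taken from Gagne:
the function names and the numeric level codes are hypothetical, and a
problem with a single borrowing is grouped with the non-adjacent case.

```python
def borrow_columns(minuend: int, subtrahend: int) -> list[int]:
    """Return the columns (0 = ones place) that need a borrow
    when working minuend - subtrahend by the standard column method."""
    if not 0 <= subtrahend <= minuend:
        raise ValueError("expected 0 <= subtrahend <= minuend")
    columns = []
    borrow = 0
    column = 0
    while minuend > 0 or subtrahend > 0:
        top = minuend % 10 - borrow      # top digit after any earlier borrow
        bottom = subtrahend % 10
        borrow = 1 if top < bottom else 0
        if borrow:
            columns.append(column)
        minuend //= 10
        subtrahend //= 10
        column += 1
    return columns


def difficulty_level(minuend: int, subtrahend: int) -> int:
    """1 = no borrowing; 2 = borrowing only in non-adjacent columns;
    3 = successive borrowings from adjacent columns."""
    columns = borrow_columns(minuend, subtrahend)
    if not columns:
        return 1
    adjacent = any(b - a == 1 for a, b in zip(columns, columns[1:]))
    return 3 if adjacent else 2


for a, b in [(58, 23), (5252, 1919), (523, 178)]:
    print(a, b, borrow_columns(a, b), difficulty_level(a, b))
```

An item writer could use a check like this to confirm that each draft
item actually exercises the borrowing pattern its diagnostic point is
meant to assess.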

In order to apply task analyses in other areas of the curriculum,
think about the nature of the skill you are assessing. What rules,
procedures, and/or principles does a student need to know in order to
attain the skill? What concepts does he/she need to understand? Are there
particular facts that need to be accessible? Each of these represents a
potential diagnostic point on the skill hierarchy.

The skill map. Use the above strategies, combined with practical
knowledge about likely and/or common sources of students' problems and
errors to develop a map, or continuum, which can guide diagnostic testing.
How many subtasks or subskills should be included in the map? On the one
hand, the more that are included the greater the diagnostic potential of
the test. On the other hand, each diagnostic point adds greatly to test
development and administration time and it may not be feasible to use more
fine-grained information in instruction. For example, it may be faster and
logistically easier to carefully reteach the skill to a group than to
painstakingly uncover the unique problems of each student. The number of
diagnostic points included on the test, then, will relate to both
feasibility and potential utility. Probably these two conditions will
suggest that a couple of points reflecting common levels of student
performance are reasonable for assessment.

Once the diagnostic assessment points have been specified on the skill
map, then the types of items that will measure each point need to be
identified. Ideally, this process mirrors the process described above for
developing a skill blueprint, with blueprints being devised for each
subskill and/or subtask. Time constraints, however, may limit the level of
detail included.
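A skill map of the kind described above can be treated as an ordered list
of diagnostic points, from simplest to the target skill, with a student
placed at the first point not yet mastered. The sketch below is one
possible reading, not a prescribed procedure: the function
`locate_on_map`, the 0.8 mastery cutoff, and the sample scores are all
assumptions made for illustration, and the map reuses the subtraction
hierarchy discussed earlier.

```python
# Diagnostic points ordered from simplest to the target skill.
SUBTRACTION_MAP = [
    "subtract without borrowing",
    "subtract with borrowings in non-adjacent columns",
    "subtract with successive borrowings from adjacent columns",
]


def locate_on_map(skill_map, scores, cutoff=0.8):
    """Return (points mastered so far, first point needing instruction).

    scores maps each diagnostic point to the proportion of its items
    answered correctly; a missing point counts as not mastered.
    The second element is None when every point meets the cutoff.
    """
    mastered = []
    for point in skill_map:
        if scores.get(point, 0.0) < cutoff:
            return mastered, point
        mastered.append(point)
    return mastered, None


# Made-up results for one student.
scores = {
    SUBTRACTION_MAP[0]: 1.0,
    SUBTRACTION_MAP[1]: 0.6,
    SUBTRACTION_MAP[2]: 0.2,
}
mastered, next_point = locate_on_map(SUBTRACTION_MAP, scores)
print("mastered:", mastered)
print("teach next:", next_point)
```

Stopping at the first unmastered point assumes the map really is
hierarchical; whether the hypothesized prerequisites behave that way is
one of the questions field testing is meant to answer.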
Step Three: Develop Test Items

Once the skill blueprints have been specified, developing the test
items is a matter of simply following the specified rules. How many items
need to be created? The statistical analysis conducted later during step
five will provide a good estimate of the number of items that will need to
be included on the final version of the test. At this preliminary stage,
however, the answer is "as many as possible," and at least three to five
items for each diagnostic point on the test, i.e., 3-5 items for each
subskill and for each rule and/or task context included within the skill
blueprints.
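The three-to-five rule of thumb above translates directly into a drafting
target. A minimal sketch follows; the function name `draft_item_targets`
and the list of diagnostic points are hypothetical, with the points drawn
loosely from the pronoun example (three usage rules plus one simpler
task).

```python
def draft_item_targets(diagnostic_points, low=3, high=5):
    """Return the (minimum, maximum) number of draft items to write,
    allowing low..high items for each diagnostic point."""
    count = len(diagnostic_points)
    return count * low, count * high


PRONOUN_POINTS = [
    "subject pronoun (nominative case)",
    "direct object pronoun (objective case)",
    "indirect object pronoun (objective case)",
    "simpler task: substitute a pronoun for an underlined word",
]

print(draft_item_targets(PRONOUN_POINTS))  # (12, 20)
```

These figures are only a starting pool for field testing; the number of
items kept on the final test is settled by the analysis in step five.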
In addition to following the test specifications, item writers will
also want to keep in mind conventional rules of thumb which help prevent
the inclusion of extraneous factors that can confound an examinee's
response. These rules concentrate on factors such as linguistic, semantic,
and grammatical features that may enable an unknowing student to give a
correct response or that may prevent a knowing student from responding
correctly. Gronlund (1968) and Conoley and O'Neil (1979) provide a
thorough explication of such rules. Typical rules are summarized in
Figure 1.

Step Four: Review the Test Items

Once items are developed, the next step is to conduct a thorough
review, considering two basic questions:

o Do the items match their specification?

o Are they free from technical flaws, i.e., do they follow
  conventional rules of item construction?

Do the items match their specification? The answer to this question
is critical to establish the content validity of the test. The process is
straightforward: have each item examined by a colleague to compare its
match with each element in the specification. That is, the description of
eligible subject matter and item features provided in the content limits
needs to be compared with the content and features of the test question;
and the specification rules for creating correct and incorrect answer
alternatives must be compared with the actual set provided in selected
response items. The items should be checked also to see that they follow
the prescribed format and that appropriate directions are given. While
covered again under "technical flaws," check also to assure that the
language used in the items is not unnecessarily difficult or complex and
that items are free from content that might be biased against particular
groups of students. Where any problems are encountered, suitable item

Figure 1 

General Guidelines for Item Writing 

Typical Rules for Multiple Choice Items: 

1. The stem of the item should be meaningful by itself and should present 
a clear problem. 

2. The stem should be free from irrelevant material. 

3. The stem should include as much of the item as possible except where an 
inclusion would clue the responses. Repetitive phrases should be 
included in the stem rather than being restated in each alternative. 

4. All alternatives should be grammatically consistent with the item stem 
and of similar length, so as not to provide a clue to the answer. 

5. An item should include only one correct or clearly best answer. 

6. Items used to measure understanding should contain some novelty and not 
merely repeat verbatim materials or problems presented in instruction. 

7. All distractors should be plausible and related to the body of 
knowledge and learning experiences measured. 

8. Verbal associations between the stem and correct answer or stereotyped 
phrases should be avoided. 

9. The correct answer should appear in each of the alternative positions 
with approximately equal frequency and in random order. 

10. Special alternatives such as "none" or "all of the above" should be used 
sparingly. 

11. Avoid items that contain inclusive terms (e.g., "never," "always," 
"all") in the wrong answer. 

12. Negatively stated item stems should be used sparingly. 

13. Avoid alternatives that are opposite in meaning or that are 
paraphrases of each other. 

14. Avoid items which ask for opinions. 

15. Avoid items that contain irrelevant sources of difficulty, such as 
vocabulary or sentence structure. 

16. Avoid interlocking items, i.e., items whose answers clue responses to 
subsequent items. 

17. Don't use multiple choice items where other item formats are more 
appropriate. 

Figure 1 (continued) 

Typical Rules for Short Answer and Completion Items: 

1. A direct question is generally better than an incomplete statement. 

2. Word the item so that the required answer is both brief and 
unambiguous. 

3. Where an answer is to be expressed in numerical units, indicate the 
type of units wanted. 

4. Blanks for answers should be equal in length. Scoring is facilitated 
if the blanks are provided in a column to the right of the question. 

5. No grammatical cues should be given, e.g., "a" or "an." 

6. Where completion items are used, do not leave too many blanks. 

7. For completion items, only key words should be left blank. Leave blank 
only those things that are important to remember. 

8. In composing items, don't take statements verbatim from students' 
textbook or instruction. 

9. The scoring key should anticipate possible synonyms or acceptable 
variants of the desired response. 

Typical Rules for True-False or Alternative Response Items: 

1. Avoid broad general statements for true-false items. 

2. Avoid trivial statements. 

3. Avoid negative statements and especially double negatives. 

4. Avoid long complex sentences. 

5. Avoid including two ideas in a single statement unless cause-effect 
relationships are being measured. 

6. Avoid questions which include indefinite terms, degrees or amounts. 

7. Include opinion statements only if they are attributed to particular 
sources. 

8. True statements and false statements should be approximately the same 
length. 

9. The number of true statements and of false statements should be 
approximately equal. 

10. Avoid taking statements verbatim from students' text or instruction. 

11. An item's truth or falsity should not depend on an insignificant word 
or phrase. 

revisions need to be made, e.g., changing language, reducing ambiguity, 
changing the item stem or alternatives to match the blueprint. (In some 
cases where unanticipated problems emerge, the blueprint itself may need 
to be modified.) 

Are the items free from technical flaws? The review process here is 
also straightforward. Simply check the items against the general rules for 
constructing test items of particular types, and where flaws are detected, 
correct them. As with the content review described above, it is preferable 
to have the review conducted by a colleague, yielding the advantage of 
having a "cold," objective eye. 

Step Five: Field Test the Items 

Field testing the items is a final step in the test development 
process to assure high quality items, to verify the test structure, and to 
determine the number of items that will be needed to reliably diagnose 
students' performance. The optimal field test procedures involve a 
two-stage process: 1) pilot test the items with a small sample of students 
to check their appropriateness; 2) administer the test to a larger sample 
to validate the subskills that need to be included in the test and the 
number of items required for each skill and subskill. 

The initial pilot test. The purpose of the first pilot test is to 
determine whether the items are appropriate for students and to identify 
items that are potentially in need of revision. Have a small number of 
students who are similar to the intended student population take the entire 
test and provide feedback on any problems they encounter, e.g., vocabulary 
or directions that are unclear, or items where there seems to be more than 
one (or no) right answer. This feedback helps indicate where revisions are 
necessary. 

Item difficulty indices (the percent of students who answer an item 
correctly) also help signal potential problem items. Because they are 
based on the same blueprint, one would expect similar item difficulties for 
all items measuring the same subskill or task. Gross deviations indicate 
items which need additional review. For example, suppose that item 
difficulties for four of the five items measuring a particular subskill are 
.5-.7; however, the difficulty of the fifth item is .25. This latter item 
should be re-examined to determine whether it is aberrant and is 
unintentionally confusing the correct response, whether it matches the 
specification, whether it represents a problem type that is different from 
the other items, whether the correct answer has been miskeyed, and/or 
whether there are typographical or other errors in the items. Any detected 
errors or deviations will need to be corrected. Item difficulties can also 
be used to help judge the appropriateness of the test for particular 
students. In order to be useful in a diagnostic sense, a test should 
measure target skills which are difficult for a substantial number of 
students: if all or most students get all or most of the items correct, 
there is little to diagnose. 
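
The difficulty check described above is easy to automate once pilot responses are scored 0/1. The sketch below computes item difficulties and flags items that stand apart from the others written to the same blueprint. The comparison against the subskill median and the tolerance of .2 are assumptions chosen for illustration; the text speaks only of "gross deviations" and sets no numeric rule.

```python
from statistics import median


def item_difficulties(responses):
    """Proportion of students answering each item correctly.

    `responses` is a list of per-student lists of 0/1 item scores,
    with items in the same order for every student.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def flag_deviant_items(difficulties, tolerance=0.2):
    """Indices of items whose difficulty differs from the median of
    their subskill by more than `tolerance` (an assumed threshold)."""
    centre = median(difficulties)
    return [
        i for i, p in enumerate(difficulties)
        if abs(p - centre) > tolerance
    ]


# Difficulties patterned on the example in the text: four items
# between .5 and .7, and a fifth at .25.
subskill_difficulties = [0.5, 0.6, 0.7, 0.6, 0.25]
flagged = flag_deviant_items(subskill_difficulties)
```

A flagged item is only a candidate for review. Whether it is miskeyed, off-specification, or a genuinely different problem type still has to be judged by reading the item.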

The field test. Once the initial piloting has been completed and 
revisions made, the revised version of the test needs to be field-tested by 
administering it to a larger sample of students (at least 100 per student 
population). Student performance on this field test should then be 
analyzed to establish the technical characteristics of the test and to 
further direct the revision process. While a thorough description of 
appropriate analytic procedures is outside the scope of this report, the 
use of generalizability analyses is recommended for the field test analysis. 
Although such analyses are complex and will require the services of an 
expert statistician, they provide full information on the structure and 
reliability of the test. Two types of generalizability analyses, 
identified below, are recommended. A brief rationale for these analyses 
follows: 

o generalizability analyses to analyze the structure of the test and 
to determine which subskills and tasks have distinct diagnostic 
value and to verify hierarchical relationships among skills and 
sub-skills; 

o separate generalizability analyses for each skill and task in the 
profile to determine the number of items that should be included in 
the test to obtain a reliable measure of those skills. 

Generalizability analyses related to structure. The content and/or 
skill dimensions included on the test reflect hypotheses about what causes 
students' performance to vary in a particular skill area, and about why 
some students score very highly and others do not. These hypotheses are 
validated if one can demonstrate that student performance within a 
dimension is relatively consistent and reflects a uniform (sub)skill, but 
is inconsistent, or varies, across dimensions. Under these conditions, a 
particular student's total score is "explained" by his/her subskills, e.g., 
a student performs at a certain level because he/she scores consistently 
well in some (or all) subskills and is consistently unable to perform 
others. These latter skills represent those in need of remediation. 

A content or skill dimension (including rules and contexts within the 
skill blueprint and tasks and subskills included in the skill map) has 
diagnostic utility and needs to be represented on a test if it demonstrates 
such explanatory power. In the absence of such power, knowledge about 
student performance on the dimension provides little additional information 
to teachers. That is, if students' performance is inconsistent within an 
area, then this area does not represent a single distinct skill. Or, if 
students perform at the same level on all dimensions, then there is no need 
to profile them separately or to provide separate scores. 

Generalizability analyses can be used to determine which of the 
dimensions included on the test have explanatory power and therefore should 
be retained as separate subskills. The analysis treats each dimension as a 
separate factor and examines the amount of variance it contributes to the 
total score. While there is no rule of thumb about what proportion of 
variance represents a large amount, some researchers have recommended 
3.5-6% as a cut-off. The decision involves a trade-off between cost and 
information. Using a small proportion as a minimum may produce more 
detailed skill profiles than are necessary. Using a large proportion as a 
minimum, on the other hand, may cause important sources of student problems 
to be overlooked or disregarded. 
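
To make the cut-off decision concrete, the sketch below takes variance components that a statistician has already estimated, converts them to shares of total variance, and lists the dimensions that meet a chosen minimum. Everything specific here is an assumption for illustration: the component labels, the numeric estimates, the 5% cut-off, and the choice of which components count as evidence for a dimension. Estimating the components themselves is the statistician's task and is not shown.

```python
def variance_shares(components):
    """Return each variance component's share of the total variance.

    `components` maps a source of variation to its estimated variance
    component. Negative estimates are treated as zero, a common
    convention that is assumed here.
    """
    clipped = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    return {name: value / total for name, value in clipped.items()}


def dimensions_to_retain(components, candidates, cutoff):
    """List the candidate dimensions whose variance share meets `cutoff`."""
    shares = variance_shares(components)
    return [name for name in candidates if shares[name] >= cutoff]


# Hypothetical estimates, as a statistician's analysis might report them.
estimates = {
    "students": 0.040,
    "students x subskill": 0.020,
    "students x task context": 0.002,
    "residual": 0.138,
}

retained = dimensions_to_retain(
    estimates,
    candidates=["students x subskill", "students x task context"],
    cutoff=0.05,  # assumed cut-off, for illustration only
)
```

Raising or lowering `cutoff` shows the trade-off directly: a smaller value keeps more dimensions in the profile, and a larger one drops dimensions that may still matter for some students.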

Generalizability analyses related to number of items. 
Generalizability analyses can also be used to determine the optimal number 
of items to include for each content or skill dimension covered on the 
test. The analytic question is "how many items are needed to provide a 
generalizable or reliable measure of student performance?" and separate 
analyses are conducted for each content or skill dimension. Like the 
analyses above, there is no firm rule of thumb for how reliable or 
consistent a score needs to be, although coefficients of .6-.7 are common. 
(See Webb et al., 1983 for a fuller explanation of the use of 
generalizability analysis.) 
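
For the simplest case, a single skill dimension in which every student answers every item, the question of how many items are needed can be sketched as follows. The code estimates student and residual variance components from the usual two-way mean squares and then finds the smallest number of items for which the relative generalizability coefficient, var_p / (var_p + var_res / n), reaches a target. This one-facet crossed design, the function names, and the sample values are assumptions for illustration; the analyses recommended in this report may involve more facets and should be specified by a statistician.

```python
def variance_components(scores):
    """Estimate (student, residual) variance components from a complete
    students x items score matrix (rows are students), using the mean
    squares of a crossed one-facet design."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_res = (ss_total - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))
    # Negative student-variance estimates are set to zero.
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    return var_p, ms_res


def g_coefficient(var_p, var_res, n_items):
    """Relative generalizability coefficient for a test of n_items."""
    return var_p / (var_p + var_res / n_items)


def items_needed(var_p, var_res, target, max_items=100):
    """Smallest number of items reaching `target`, or None if the
    target is not reached within `max_items`."""
    for n in range(1, max_items + 1):
        if g_coefficient(var_p, var_res, n) >= target:
            return n
    return None


# Hypothetical components for one subskill; target within the .6-.7 range.
n_required = items_needed(var_p=0.06, var_res=0.18, target=0.65)
```

With these hypothetical components the coefficient equals n / (n + 3), so five items give .625 and six give about .67. The same calculation would be repeated separately for each content or skill dimension.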

Based on these analyses, the final diagnostic test can be constructed, 
reflecting the structure and item requirements indicated by the above 
analyses. 

Diagnostic testing can provide specific information about student 
skills as a decision-making aid to teachers in prescribing instruction, 
identifying needs for remediation, determining effective instructional 
materials and methods, and ultimately, improving student learning. 
Diagnostic testing, as viewed here, includes individual and group 
assessment of students' skills in specified cognitive domains. A 
methodology is presented for designing diagnostic tests which assess the 
extent of student learning and are sensitive to sources of difficulty 
within a skill or context area. This 5-step methodology for diagnostic 
test development includes: 

1) Developing a skill blueprint including a general description of 
the objective or skill, a sample item, content limits, and response 
limits; 

2) Specifying the skill map including sub-skills or simpler contexts 
which students should master en route to the desired skill under 
assessment; 

3) Formulating test items that match specifications and follow 
conventions for sound item-writing; 

4) Reviewing test items to ensure match to specifications and 
technical quality; 

5) Field testing the items and revising to ensure that the test is 
appropriate for the intended student population and structured to 
provide meaningful and reliable diagnostic information. 